NSF PAR Search | NSF Public Access Repository

ZipLLM: Efficient LLM Storage via Model-Aware Synergistic Data Deduplication and Compression

Wang, Zirui; Lan, Tingfeng; Su, Zhaoyuan; Yang, Juncheng; Cheng, Yue (May 2026, 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI '26))

Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques, such as deduplication and compression, are either LLM-oblivious or incompatible with each other, which limits data reduction effectiveness. Our large-scale characterization study across all publicly available Hugging Face LLM repositories reveals several key insights: (1) fine-tuned models within the same family exhibit highly structured, sparse parameter differences suitable for delta compression; (2) bitwise similarity enables LLM family clustering; and (3) tensor-level deduplication is better aligned with model storage workloads, achieving high data reduction with low metadata overhead. Building on these insights, we design BitX, an effective, fast, lossless delta compression algorithm that compresses the XORed difference between fine-tuned and base LLMs. We build ZipLLM, a model storage reduction pipeline that unifies tensor-level deduplication and lossless BitX compression. By synergizing deduplication and compression around LLM family clustering, ZipLLM reduces model storage consumption by 54%, over 20% higher than state-of-the-art deduplication and compression approaches.
Full Text Available
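The two ideas named in the ZipLLM abstract, XOR-based delta compression and tensor-level deduplication, can be illustrated with a minimal sketch. This is not the authors' BitX or ZipLLM code: the function names, the synthetic weights, and the use of `zlib` as the back-end compressor are all assumptions made for illustration.

```python
import hashlib
import random
import struct
import zlib


def xor_delta(base: bytes, other: bytes) -> bytes:
    """Bytewise XOR of two equally sized tensor buffers (its own inverse)."""
    if len(base) != len(other):
        raise ValueError("tensors must have the same byte length")
    return bytes(a ^ b for a, b in zip(base, other))


def compress_against_base(base: bytes, tuned: bytes) -> bytes:
    """XOR the fine-tuned tensor with its base, then compress the residue."""
    # zlib is a stand-in compressor chosen for this sketch (assumption).
    return zlib.compress(xor_delta(base, tuned), 9)


def restore(base: bytes, blob: bytes) -> bytes:
    """Losslessly rebuild the fine-tuned tensor from the base and the blob."""
    return xor_delta(base, zlib.decompress(blob))


def dedup(tensors: dict[str, bytes]) -> tuple[dict[str, str], dict[str, bytes]]:
    """Tensor-level deduplication: keep one copy per distinct content hash."""
    index, store = {}, {}
    for name, data in tensors.items():
        digest = hashlib.sha256(data).hexdigest()
        index[name] = digest
        store.setdefault(digest, data)
    return index, store


# Synthetic float32 "weights" (hypothetical data): a base tensor and a
# slightly perturbed copy standing in for a fine-tuned variant.
rng = random.Random(0)
base_vals = [rng.gauss(0.0, 0.02) for _ in range(4096)]
tuned_vals = [w + rng.gauss(0.0, 1e-6) for w in base_vals]
base = struct.pack(f"<{len(base_vals)}f", *base_vals)
tuned = struct.pack(f"<{len(tuned_vals)}f", *tuned_vals)

blob = compress_against_base(base, tuned)
direct = zlib.compress(tuned, 9)
index, store = dedup({"a.embed": base, "b.embed": base, "b.head": tuned})
print(len(direct), len(blob), len(store))
```

Because the perturbed values share their sign, exponent, and leading mantissa bits with the base, the XOR residue is dominated by zero bytes and compresses better than the raw fine-tuned tensor, while identical tensors collapse to a single stored copy.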
TokenCompose: Grounding Diffusion with Token-level Supervision

Wang, Zirui; Sha, Zhizhou; Ding, Zheng; Wang, Yilin; Tu, Zhuowen (June 2024, Proceedings)

Full Text Available
Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

https://doi.org/10.14778/3659437.3659456

Su, Zhaoyuan; Ahmed, Ammar; Wang, Zirui; Anwar, Ali; Cheng, Yue (April 2024, Proceedings of the VLDB Endowment)

As the number of pre-trained machine learning (ML) models is growing exponentially, data reduction tools are not catching up. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files. This is largely due to a lack of understanding of the patterns and characteristics of these datasets, especially those relevant to data reduction and compressibility. This paper presents the first exhaustive analysis to date of PTM datasets on storage compressibility. Our analysis spans different types of data reduction and compression techniques, from hash-based data deduplication and data similarity detection to dictionary-coding compression. Our analysis explores these techniques at three data granularity levels: model layers, model chunks, and model parameters. We draw new observations that indicate that modern data reduction tools are not effective when handling PTM datasets. There is a pressing need for new compression methods that take into account PTMs' data characteristics for effective storage reduction. Motivated by our findings, we design Elf, a simple yet effective, error-bounded, lossy floating-point compression method. Elf transforms floating-point parameters in such a way that the common exponent field of the transformed parameters can be completely eliminated to save storage space. We develop Elves, a compression framework that integrates Elf along with several other data reduction methods. Elves uses the most effective method to compress PTMs that exhibit different patterns. Evaluation shows that Elves achieves an overall compression ratio of 1.52×, which is 1.31×, 1.32× and 1.29× higher than a general-purpose compressor (zstd), an error-bounded lossy compressor (SZ3), and uniform model quantization, respectively, with negligible model accuracy loss.
Full Text Available
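The exponent-elimination idea described in the Elf abstract can be sketched as follows. The abstract does not give the transform, so this example assumes one plausible choice: parameters in (-1, 1) are shifted into [1, 2), where every float32 value shares the biased exponent 127, leaving only the sign bit and 23 mantissa bits to store. The function names and the 24-bit code layout are hypothetical, not the paper's format.

```python
import struct

EXP_ONE = 127  # biased float32 exponent shared by every value in [1, 2)


def encode(x: float) -> int:
    """Return a 24-bit code: 1 sign bit followed by 23 mantissa bits."""
    if not -1.0 < x < 1.0:
        raise ValueError("sketch only handles parameters in (-1, 1)")
    # Shift |x| into [1, 2) and round to float32; this rounding is the lossy step.
    bits = struct.unpack("<I", struct.pack("<f", abs(x) + 1.0))[0]
    if (bits >> 23) & 0xFF != EXP_ONE:
        raise ValueError("value rounds up to 2.0 in float32")
    sign = 1 if x < 0 else 0
    return (sign << 23) | (bits & 0x7FFFFF)


def decode(code: int) -> float:
    """Reattach the implied exponent, undo the shift, and restore the sign."""
    bits = (EXP_ONE << 23) | (code & 0x7FFFFF)
    shifted = struct.unpack("<f", struct.pack("<I", bits))[0]
    value = shifted - 1.0
    return -value if code >> 23 else value


samples = [0.0, 0.0123, -0.5, 0.75, -3.2e-4]
codes = [encode(x) for x in samples]
errors = [abs(decode(c) - x) for c, x in zip(codes, samples)]
print(max(errors))
```

Under this assumed transform each parameter takes 24 bits instead of 32, and the absolute reconstruction error is bounded by the float32 spacing in [1, 2), which is 2^-23.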
Precise Surface Profiling at the Nanoscale Enabled by Deep Learning

https://doi.org/10.1021/acs.nanolett.3c04712

Bonagiri, Lalith Krishna; Wang, Zirui; Zhou, Shan; Zhang, Yingjie (February 2024, Nano Letters)

Full Text Available
Language Models as Science Tutors

Chevalier, Alexis; Geng, Jiayi; Wettig, Alexander; Chen, Howard; Mizera, Sebastian; Annala, Toni; Aragon, Max Jameson; Rodriguez Fanlo, Arturo; Frieder, Simon; Machado, Simon; et al (May 2024, International Conference on Machine Learning)

Full Text Available
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

Wang, Zirui; Tsvetkov, Yulia (September 2021, Proceedings of the International Conference on Learning Representations (ICLR))

Full Text Available
Identifying Reactive Trends in Glycerol Electro-Oxidation Using an Automated Screening Approach: 28 Ways to Electrodeposit an Au Electrocatalyst

https://doi.org/10.1021/acscatal.4c04190

Gaddam, Raghuram; Wang, Zirui; Li, Yichen; Harris, Lauren C; Pence, Michael A; Guerrero, Efren R; Kenis, Paul J A; Gewirth, Andrew A; Rodríguez-López, Joaquín (January 2025, ACS Catalysis)

Full Text Available
Measurement of exclusive $J/\psi$ and $\psi(2S)$ production at $\sqrt{s}=13$ TeV

https://doi.org/10.21468/SciPostPhys.18.2.071

LHCb Collaboration; Aaij, Roel; Abdelmotteleb, Ahmed Sameh Wagih; Abellan Beteta, Carlos; Abudinèn, Fernando Jesus; Ackernley, Thomas; Adefisoye, Ayomide Matthew; Adeva, Bernardo; Adinolfi, Marco; Adlarson, Patrik Harri; et al (January 2025, SciPost Physics)

Measurements are presented of the cross-section for the central exclusive production of $J/\psi \to \mu^+\mu^-$ and $\psi(2S) \to \mu^+\mu^-$ processes in proton-proton collisions at $\sqrt{s} = 13\ \mathrm{TeV}$ with 2016–2018 data. They are performed by requiring both muons to be in the LHCb acceptance (with pseudorapidity $2 < \eta_{\mu^\pm} < 4.5$) and mesons in the rapidity range $2.0 < y < 4.5$. The integrated cross-section results are $\sigma_{J/\psi \to \mu^+\mu^-}(2.0 < y_{J/\psi} < 4.5,\ 2.0 < \eta_{\mu^\pm} < 4.5) = 400 \pm 2 \pm 5 \pm 12\ \mathrm{pb}$ and $\sigma_{\psi(2S) \to \mu^+\mu^-}(2.0 < y_{\psi(2S)} < 4.5,\ 2.0 < \eta_{\mu^\pm} < 4.5) = 9.40 \pm 0.15 \pm 0.13 \pm 0.27\ \mathrm{pb}$, where the uncertainties are statistical, systematic and due to the luminosity determination. In addition, a measurement of the ratio of $\psi(2S)$ and $J/\psi$ cross-sections, at an average photon-proton centre-of-mass energy of $1\ \mathrm{TeV}$, is performed, giving $0.1763 \pm 0.0029 \pm 0.0008 \pm 0.0039$, where the first uncertainty is statistical, the second systematic and the third due to the knowledge of the involved branching fractions. For the first time, the dependence of the $J/\psi$ and $\psi(2S)$ cross-sections on the total transverse momentum transfer is determined in $pp$ collisions and is found consistent with the behaviour observed in electron-proton collisions.
Full Text Available
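The LHCb abstract above quotes each cross-section with three separate uncertainties. As a rough illustration only, the sketch below combines them in quadrature, which assumes the components are uncorrelated; the abstract does not state a combined uncertainty, and the helper name is hypothetical.

```python
import math


def total_uncertainty(*components: float) -> float:
    """Quadrature sum of uncertainty components, assumed uncorrelated."""
    return math.sqrt(sum(c * c for c in components))


# Values in pb, taken from the abstract: statistical, systematic, luminosity.
jpsi_value, jpsi_parts = 400.0, (2.0, 5.0, 12.0)
psi2s_value, psi2s_parts = 9.40, (0.15, 0.13, 0.27)

jpsi_total = total_uncertainty(*jpsi_parts)
psi2s_total = total_uncertainty(*psi2s_parts)
print(f"J/psi:   {jpsi_value} +/- {jpsi_total:.1f} pb")
print(f"psi(2S): {psi2s_value} +/- {psi2s_total:.2f} pb")
```

In both results the luminosity term is the largest single contribution to the combined figure.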
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adri; et al (January 2023, Transactions on machine learning research)

Full Text Available
Operation and performance of the ATLAS semiconductor tracker in LHC Run 2

https://doi.org/10.1088/1748-0221/17/01/P01013

Aad, Georges; Abbott, Brad; Abbott, Dale Charles; Abed Abud, Adam; Abeling, Kira; Abhayasinghe, Deshan Kavishka; Abidi, Syed Haider; Aboulhorma, Asmaa; Abramowicz, Halina; Abreu, Henso; et al (January 2022, Journal of Instrumentation)

The semiconductor tracker (SCT) is one of the tracking systems for charged particles in the ATLAS detector. It consists of 4088 silicon strip sensor modules. During Run 2 (2015–2018) the Large Hadron Collider delivered an integrated luminosity of 156 fb$^{-1}$ to the ATLAS experiment at a centre-of-mass proton-proton collision energy of 13 TeV. The instantaneous luminosity and pile-up conditions were far in excess of those assumed in the original design of the SCT detector. Due to improvements to the data acquisition system, the SCT operated stably throughout Run 2. It was available for 99.9% of the integrated luminosity and achieved a data-quality efficiency of 99.85%. Detailed studies have been made of the leakage current in SCT modules and the evolution of the full depletion voltage, which are used to study the impact of radiation damage to the modules.
Full Text Available